Capstone Project - The Battle of neighbourhoods (Week-1)

🏨 Bed and Breakfast Location in Brooklyn, NY, with the Barclays Center as epicenter

Applied Data Science Capstone IBM-Coursera

Table of Contents

Week - 1

✔ Introduction: Business Problem

✔ Data

Week - 2

✔ Methodology and Data Analysis

✔ Results and Discussion

✔ Conclusion

Introduction: Business Problem

New York has always been a fascinating city with visitors year-round, which makes tourism a thriving business. Manhattan is still the main destination for visitors, yet hotels in Manhattan are quite expensive.

Brooklyn has become trendy thanks to its arts and music scene, shaped by the influence of different cultures, especially Black communities, some of which are now suffering gentrification as new investment in infrastructure arrives. With its proximity to Manhattan and access to the subway, a cheap means of transport, Brooklyn is an ideal place to set up a trendy Bed and Breakfast: for example, turning a four-story brownstone home into a good business for locals and families instead of renting it out.

The goal is to give a prospective investor the best available information to make a decision based on the following requirements:

  1. Within a 1.6 km (1 mile) radius of the Barclays Center

  2. Not overcrowded with B&Bs or similar businesses

  3. A variety of venues such as cafés, restaurants and pubs

Optional items that would add value to the exercise:

  1. Perform a competitive analysis: if possible, obtain the building size of competitors in square meters/square feet.

  2. Search the area under study for properties of similar size and their prices.

  3. Select a few properties and check their immediate neighbourhood for venues and amenities.

  4. Make a recommendation.

We will use the skills and tools learned in the IBM-Coursera specialization to meet these requirements.

Data

With the problem in hand, we need the following information:

  1. Identify B&Bs and similar businesses within a 1.6 km radius of the Barclays Center.

     ✔ This fulfills the 1st and 2nd requirements.
  2. Identify trendy areas with cafés, restaurants, etc. close to the Barclays Center.
     ✔ This puts neighbourhoods in "competition" for the investment, fulfilling the 3rd requirement.

Additional information:

  1. Identify homes/properties for sale of similar size, or with at least 6-8 bedrooms.
     ✔ Gather address, area (square meters), features, price and realtor contact.

Sources and tools:

  • Geolocation data for the Barclays Center
  • Foursquare database for B&B and venue information within the study radius
  • Zillow.com for Real Estate Data

Methodology & Data Analysis

  1. Import the required libraries and tools (the list may need a clean-up).
  2. Search for B&Bs and hostels within 1.6 km (1 mile) of the Barclays Center, i.e. within walking distance.
  3. In addition, gather all other venues. Observation: one B&B was not captured by the B&B/hostel search, yet it did show up among all venues.
  • Note that Foursquare also returned other types of venues, such as bus stations.
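Step 2 relies on a straight-line (great-circle) distance to decide what counts as walking distance. Foursquare already reports a `distance` field, but the check can be reproduced locally. A minimal sketch using the haversine formula follows; the Barclays Center coordinates (40.6826, -73.9754) are approximate values assumed here, not taken from the notebook, and `haversine_m`/`within_radius` are hypothetical helper names.

```python
import math

# Approximate Barclays Center coordinates (assumed, not from the notebook)
BARCLAYS_CENTER = (40.6826, -73.9754)
EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance between two (lat, lng) points, in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(venues, center=BARCLAYS_CENTER, radius_m=1600):
    """Keep the (name, lat, lng) venues whose distance to `center` is at most radius_m."""
    return [
        (name, lat, lng)
        for name, lat, lng in venues
        if haversine_m(center[0], center[1], lat, lng) <= radius_m
    ]


# Coordinates copied from the Foursquare results in this notebook
venues = [
    ("Regina's New York Bed and Breakfast", 40.689286, -73.977212),
    ("Garden Green Bed & Breakfast", 40.677818, -73.971934),
    ("Bed And Breakfast On The Park", 40.666222, -73.975874),
]
print([name for name, _, _ in within_radius(venues)])
```

With these assumed center coordinates, Regina's comes out at roughly 760 m (Foursquare reports 760 m), while Bed And Breakfast On The Park lands at about 1.82 km and falls outside the 1.6 km radius.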
In [405]:
# show only the first 10 names; drop .head(10) to see all 138 venues

dataframe_filtered_bnb_venues.name.head(10)
Out[405]:
0       Regina's New York Bed and Breakfast
1              Garden Green Bed & Breakfast
2                      3B Bed And Breakfast
3             Bed And Breakfast On The Park
4                   Imhotep Bed & Breakfast
5                  Sterling Bed & Breakfast
6                          Bed Bath & Linen
7    Bed Stuy Acupuncture & Massage Therapy
8             Bed Stay food and meat center
9                 Bedford & Putnam Antiques
Name: name, dtype: object
  • The data needs cleaning: after keeping only hostels and B&Bs we end up with 11 venues (7 B&Bs and 4 hostels).
  • Some businesses also have no address information, which we need to fix.
In [408]:
# keep only hostels and B&Bs: the search also returned unrelated businesses and bus stops
categ_filter = ['Hostel', 'Bed & Breakfast']
dataframe_filtered_bnb = dataframe_filtered_bnb_venues[dataframe_filtered_bnb_venues.categories.isin(categ_filter)]
dataframe_filtered_bnb = dataframe_filtered_bnb.reset_index(drop=True)
dataframe_filtered_bnb
Out[408]:
name categories address neighbourhood distance lat lng postalCode cc city state country formattedAddress labeledLatLngs crossStreet id
0 Regina's New York Bed and Breakfast Bed & Breakfast 16 Fort Greene Pl NaN 760 40.689286 -73.977212 11217 US Brooklyn NY United States [16 Fort Greene Pl, Brooklyn, NY 11217, United... [{'label': 'display', 'lat': 40.68928599999999... NaN 4cbb751da33bb1f7c76f94fd
1 Garden Green Bed & Breakfast Bed & Breakfast 641 Carlton Ave NaN 603 40.677818 -73.971934 11238 US Brooklyn NY United States [641 Carlton Ave, Brooklyn, NY 11238, United S... [{'label': 'display', 'lat': 40.677818, 'lng':... NaN 57745c33498ee02504f013aa
2 3B Bed And Breakfast Bed & Breakfast NaN NaN 1415 40.692115 -73.986426 NaN US NaN New York United States [New York, United States] [{'label': 'display', 'lat': 40.69211532779243... NaN 4d913e56939e54816ae8c99e
3 Bed And Breakfast On The Park Bed & Breakfast 113 Prospect Park W NaN 1825 40.666222 -73.975874 11215 US Brooklyn NY United States [113 Prospect Park W (7th), Brooklyn, NY 11215... [{'label': 'display', 'lat': 40.666222, 'lng':... 7th 4ba18a03f964a52019bf37e3
4 Imhotep Bed & Breakfast Bed & Breakfast 1070 Bedford Ave NaN 1813 40.688222 -73.955108 11216 US Brooklyn NY United States [1070 Bedford Ave (Greene Avenue), Brooklyn, N... [{'label': 'display', 'lat': 40.68822199999999... Greene Avenue 4b9c3d7ff964a5200e5836e3
5 Sterling Bed & Breakfast Bed & Breakfast 686 Sterling Pl NaN 1959 40.672876 -73.955937 11216 US Brooklyn NY United States [686 Sterling Pl, Brooklyn, NY 11216, United S... [{'label': 'display', 'lat': 40.672876, 'lng':... NaN 4cb7925d4c60a093988832ca
6 Hostel Hostel 32 Kosciusco St NaN 1987 40.690383 -73.954084 15233 US New York NY United States [32 Kosciusco St, New York, NY 15233, United S... [{'label': 'display', 'lat': 40.69038330685645... NaN 4e3154892271cfd3d049aff8
7 Lafayette International Hostel Hostel 484 Lafayette Ave NaN 1800 40.689197 -73.955798 11205 US Brooklyn NY United States [484 Lafayette Ave (Bedford Avenue), Brooklyn,... [{'label': 'display', 'lat': 40.68919658660889... Bedford Avenue 4c536ec36a4bb7132e270c27
8 Dekalb Hostel Hostel NaN NaN 2039 40.690748 -73.953633 11205 US Brooklyn NY United States [Brooklyn, NY 11205, United States] [{'label': 'display', 'lat': 40.690748, 'lng':... NaN 50208248e4b08241e0a46419
9 Esperanto Hostel Hostel NaN NaN 2039 40.690823 -73.953677 NaN US Brooklyn NY United States [Brooklyn, NY, United States] [{'label': 'display', 'lat': 40.69082269597369... NaN 51847d39498ef45e54f12e1c
10 Baba Yaga's Hut Bed & Breakfast 472 Bergen NaN 59 40.682090 -73.975424 11217 US Brooklyn NY United States [472 Bergen (Flatbush), Brooklyn, NY 11217, Un... [{'label': 'display', 'lat': 40.68209011433128... Flatbush 4d28f3edc406721e409576b6
  • The table above lists the 11 venues of interest, but the neighbourhood field is empty for all of them.
  • Missing addresses are fixed with reverse geocoding: a function uses each venue's coordinates to generate an address for the missing fields.
  • The resulting data frame contains a valid address, neighbourhood and ZIP code, along with the distance from the Barclays Center, so we now have the right address information.
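The reverse-geocoding fix can be sketched as a function that only calls the geocoder for rows with missing fields. The notebook's own function is not shown, so this is an assumption-laden sketch: the geocoder is passed in as a callable (so the example runs offline), and the commented lines show how it could wrap geopy's `Nominatim.reverse`. `fill_missing_address` and `fake_reverse` are hypothetical names, and the stub's return values are made up.

```python
import pandas as pd


def fill_missing_address(df, reverse_fn):
    """Fill missing 'address', 'neighbourhood' and 'postalCode' values.

    reverse_fn(lat, lng) is any callable returning a dict shaped like
    Nominatim's address details ('house_number', 'road', 'neighbourhood', 'postcode').
    """
    cols = ['address', 'neighbourhood', 'postalCode']
    out = df.copy()
    # Object dtype lets us write strings into columns that may be all-NaN (float)
    out[cols] = out[cols].astype(object)
    for idx, row in df.iterrows():
        if not row[cols].isna().any():
            continue  # nothing to fix for this venue
        addr = reverse_fn(row['lat'], row['lng'])
        if pd.isna(row['address']):
            parts = [addr.get('house_number'), addr.get('road')]
            out.at[idx, 'address'] = ' '.join(p for p in parts if p) or None
        if pd.isna(row['neighbourhood']):
            out.at[idx, 'neighbourhood'] = addr.get('neighbourhood')
        if pd.isna(row['postalCode']):
            out.at[idx, 'postalCode'] = addr.get('postcode')
    return out


# With geopy, reverse_fn could wrap Nominatim (needs network access):
#   from geopy.geocoders import Nominatim
#   geolocator = Nominatim(user_agent="bnb-capstone")  # hypothetical user agent
#   reverse_fn = lambda lat, lng: geolocator.reverse((lat, lng)).raw.get('address', {})

def fake_reverse(lat, lng):
    # Offline stand-in geocoder; the values are made up for illustration
    return {'house_number': '1', 'road': 'Example St',
            'neighbourhood': 'Example Hill', 'postcode': '00000'}


venues = pd.DataFrame({
    'name': ['Dekalb Hostel', "Regina's New York Bed and Breakfast"],
    'address': [None, '16 Fort Greene Pl'],
    'neighbourhood': [None, None],
    'postalCode': ['11205', '11217'],
    'lat': [40.690748, 40.689286],
    'lng': [-73.953633, -73.977212],
})
print(fill_missing_address(venues, fake_reverse)[['name', 'address', 'neighbourhood', 'postalCode']])
```

Existing values are never overwritten, so Foursquare data takes precedence over the geocoder. When calling the public Nominatim service, its usage policy limits clients to about one request per second; geopy's `RateLimiter` (in `geopy.extra.rate_limiter`) can enforce that.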

>> Here we can see the position of each venue relative to the Barclays Center

  • The image below is provided in case GitHub does not render the interactive map above.

Barclay_BnB_map1.png

>> Start of Neighbourhood analysis

In [424]:
# Brooklyn neighbourhoods of interest with geo-coordinates
brook_neigh
Out[424]:
   House_number_name  road               neighbourhood     suburb    county        city      state     postcode  country                   Lat      Long
0  32                 Underhill Avenue   Prospect Heights  Brooklyn  Kings County  New York  New York  11238     United States of America  40.68    -73.9651
1  177                Greene Avenue      Fort Greene       Brooklyn  Kings County  New York  New York  11238     United States of America  40.687   -73.9649
2  303                Vanderbilt Avenue  Clinton Hill      Brooklyn  Kings County  New York  New York  11205     United States of America  40.689   -73.9688
3  148                Sterling Place     Park Slope        Brooklyn  Kings County  New York  New York  11217     United States of America  40.6768  -73.9733
4  126                4th Avenue         Gowanus           Brooklyn  Kings County  New York  New York  11217     United States of America  40.6805  -73.9813
5  345                Dean Street        Boerum Hill       Brooklyn  Kings County  New York  New York  11217     United States of America  40.6837  -73.9803

>> Now let's get the venues in these 6 neighbourhoods and check which are the most common
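Ranking the most common venues per neighbourhood can be sketched with one-hot encoding: the mean of the one-hot columns within a neighbourhood gives each category's share, and sorting those shares gives the ranking. This mirrors the `1st Most Common Venue`, `2nd Most Common Venue`, ... layout of the cluster tables in this notebook, but the input columns (`neighbourhood`, `category`), the helper names and the sample rows are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def ordinal(i):
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 22 -> '22nd'."""
    suffix = 'th' if 10 <= i % 100 <= 20 else {1: 'st', 2: 'nd', 3: 'rd'}.get(i % 10, 'th')
    return f"{i}{suffix}"


def top_venues(venues, n=10):
    """Rank venue categories per neighbourhood by their share of that neighbourhood's venues."""
    onehot = pd.get_dummies(venues['category'], dtype=float)
    onehot.insert(0, 'neighbourhood', venues['neighbourhood'])
    shares = onehot.groupby('neighbourhood').mean()  # share of each category per neighbourhood

    columns = ['neighbourhood'] + [f"{ordinal(i)} Most Common Venue" for i in range(1, n + 1)]
    rows = []
    for hood, row in shares.iterrows():
        ranked = row.sort_values(ascending=False).index[:n].tolist()
        ranked += [np.nan] * (n - len(ranked))  # pad when there are fewer than n categories
        rows.append([hood] + ranked)
    return pd.DataFrame(rows, columns=columns)


# Toy input (hypothetical rows, not the notebook's Foursquare results)
venues = pd.DataFrame({
    'neighbourhood': ['Hood A'] * 4 + ['Hood B'] * 3,
    'category': ['Bar', 'Bar', 'Bar', 'Coffee Shop', 'Gym', 'Gym', 'Bagel Shop'],
})
print(top_venues(venues, n=3))
```

The same `shares` table (one row per neighbourhood, one column per category) is a natural input for clustering.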

  • Cluster Map for the Brooklyn neighbourhoods of interest.

Barclay_Cluster_map2.png
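The clustering step can be sketched with scikit-learn's `KMeans` applied to per-neighbourhood venue-category shares. The numbers below are toy values, not the notebook's data; k=3 matches the three clusters used in this analysis. Cluster numbers are arbitrary labels, so "cluster 0" is only meaningful for a given fit, which is why `random_state` is fixed.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical category shares per neighbourhood (toy numbers, NOT the notebook's data)
freq = pd.DataFrame(
    {
        'Bar':         [0.30, 0.28, 0.05, 0.04, 0.02, 0.03],
        'Pizza Place': [0.05, 0.06, 0.30, 0.32, 0.04, 0.05],
        'Gym':         [0.02, 0.03, 0.04, 0.02, 0.35, 0.33],
    },
    index=['Hood A', 'Hood B', 'Hood C', 'Hood D', 'Hood E', 'Hood F'],
)

# 3 clusters; n_init restarts k-means 10 times and keeps the lowest-inertia fit
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(freq)
clusters = pd.Series(kmeans.labels_, index=freq.index, name='Cluster Labels')
print(clusters)
```

Neighbourhoods with similar venue mixes (here, the bar-heavy pair, the pizza-heavy pair and the gym-heavy pair) end up sharing a label, which is how the cluster tables below group Prospect Heights, Gowanus and Boerum Hill together.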

In [447]:
brook_merged_short.loc[brook_merged_short['Cluster Labels'] == 0, brook_merged_short.columns[[0] + list(range(5, brook_merged_short.shape[1]))]]
Out[447]:
neighbourhood Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
0 Prospect Heights 0 Bar Wine Shop Mexican Restaurant Bakery Café Cocktail Bar New American Restaurant Thai Restaurant Coffee Shop Yoga Studio
4 Gowanus 0 Bar Coffee Shop Pizza Place Dessert Shop Yoga Studio Deli / Bodega Gym Mexican Restaurant Taco Place Bakery
5 Boerum Hill 0 Bar Bakery Coffee Shop Dessert Shop Pizza Place Dance Studio Cosmetics Shop Lounge Yoga Studio Japanese Restaurant
In [448]:
brook_merged_short.loc[brook_merged_short['Cluster Labels'] == 1, brook_merged_short.columns[[0] + list(range(5, brook_merged_short.shape[1]))]]
Out[448]:
neighbourhood Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
1 Fort Greene 1 Pizza Place Café Italian Restaurant Wine Shop Bakery Cocktail Bar Bar New American Restaurant Deli / Bodega Indian Restaurant
2 Clinton Hill 1 Italian Restaurant Playground Cocktail Bar Yoga Studio Thai Restaurant New American Restaurant Japanese Restaurant Pizza Place Café Diner
In [449]:
brook_merged_short.loc[brook_merged_short['Cluster Labels'] == 2, brook_merged_short.columns[[0] + list(range(5, brook_merged_short.shape[1]))]]
Out[449]:
neighbourhood Cluster Labels 1st Most Common Venue 2nd Most Common Venue 3rd Most Common Venue 4th Most Common Venue 5th Most Common Venue 6th Most Common Venue 7th Most Common Venue 8th Most Common Venue 9th Most Common Venue 10th Most Common Venue
3 Park Slope 2 Gym / Fitness Center Coffee Shop American Restaurant Bagel Shop Pizza Place Sushi Restaurant New American Restaurant Grocery Store Gourmet Shop Bakery
  • The 3 clusters help identify similarities and differences between neighbourhoods.
  • Cluster 0 is where we will focus when looking for possible B&B properties.
  • Map of the Barclays Center + neighbourhood clusters + B&Bs in the area:
  • Barclays Center: black
  • Clusters: 0-red, 1-purple, 2-cyan
  • B&Bs: blue dots

Barclay_Cluster_BnB_map3.png

  • Here we can see that the B&Bs are fairly spread out and not too close to the Barclays Center, which may also reflect cheaper real estate further away.
  • In addition, 2 neighbourhoods, Boerum Hill and Gowanus, do not have a B&B.
  • We can use those locations to find properties and compare them against Clinton Hill and Bed-Stuy.
  • There are 3 ZIP codes we can use to gather Zillow data (11238, 11205 and 11217); we will focus on 2.
In [129]:
brook_merged_short[['neighbourhood', 'postcode', 'Cluster Labels']]
Out[129]:
   neighbourhood     postcode  Cluster Labels
0  Prospect Heights  11238     0
1  Fort Greene       11238     1
2  Clinton Hill      11205     1
3  Park Slope        11217     2
4  Gowanus           11217     0
5  Boerum Hill       11217     0
In [454]:
dataframe_filtered_bnb['postalCode'].value_counts()
Out[454]:
11205    4
11217    2
11216    2
11201    1
11238    1
11215    1
Name: postalCode, dtype: int64

Real Estate Data: comparing 2 ZIP codes, 11205 and 11217

  • Zillow listings were web-scraped and organized into CSV files.
  • Listings with incomplete data and foreclosures were excluded from this analysis.
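A minimal sketch of this cleaning step, assuming the CSV columns shown in the tables below and assuming foreclosures are marked in `type_of_property` (the actual marker in the scraped data is not shown). It also normalizes the provider column, which is spelled `real_estate_provider` in one file and `real estate provider` in the other. `clean_listings` is a hypothetical name, and every CSV row except 134 Douglass St is made up.

```python
import io

import pandas as pd

# Hypothetical CSV excerpt; only the first row comes from the notebook's listings
raw_csv = """type_of_property,address,price,beds,bath,area_sqft,real estate provider
Townhouse for sale,134 Douglass St Brooklyn NY 11217,1688000,3,3,1800,Brownstone Real Estate
Foreclosure,1 Example St Brooklyn NY 11217,900000,3,2,2000,
House for sale,2 Example St Brooklyn NY 11217,,4,3,,
"""


def clean_listings(df):
    """Normalize column names, then drop foreclosures and incomplete listings."""
    df = df.copy()
    # 'real estate provider' vs 'real_estate_provider' differ between the two files
    df.columns = [c.strip().replace(' ', '_') for c in df.columns]
    is_foreclosure = df['type_of_property'].str.contains('foreclos', case=False, na=False)
    df = df[~is_foreclosure]
    # A listing is unusable without price, beds, bath and area; a missing provider is tolerated
    df = df.dropna(subset=['price', 'beds', 'bath', 'area_sqft'])
    return df.reset_index(drop=True)


listings = clean_listings(pd.read_csv(io.StringIO(raw_csv)))
print(listings)
```

A missing realtor is deliberately not treated as incomplete, since the filtered 11217 listings later in the notebook still include rows without a provider.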
In [456]:
brook_11205_zillow.head()
Out[456]:
type_of_property address city state postal_code price beds bath area_sqft real_estate_provider webpage
0 Apartment for sale 35 Cumberland St Brooklyn NY 11205 Brooklyn NY 11205 1075000 4 4 2520 Keller Williams Realty Empire https://www.zillow.com/homedetails/35-Cumberla...
1 Apartment for sale 197 Waverly Ave Brooklyn NY 11205 Brooklyn NY 11205 4500000 3 3 4500 Corcoran https://www.zillow.com/homedetails/197-Waverly...
2 Co-op for sale 185 Clinton Ave APT 1A Brooklyn NY 11205 Brooklyn NY 11205 449000 1 1 651 Brown Harris Stevens https://www.zillow.com/homedetails/185-Clinton...
3 Co-op for sale 205 Clinton Ave 12A Brooklyn NY 11205 Brooklyn NY 11205 730000 2 1 750 AP Realty Group NY https://www.zillow.com/homedetails/205-Clinton...
4 Townhouse for sale 280 Washington Ave Brooklyn NY 11205 Brooklyn NY 11205 9200000 7 8 10000 Corcoran https://www.zillow.com/homedetails/280-Washing...
In [457]:
brook_11217_zillow.head()
Out[457]:
type_of_property address city state postal_code price beds bath area_sqft real estate provider webpage
0 Townhouse for sale 134 Douglass St Brooklyn NY 11217 Brooklyn NY 11217 1688000 3 3 1800 Brownstone Real Estate https://www.zillow.com/homedetails/134-Douglas...
1 Townhouse for sale 471 State St Brooklyn NY 11217 Brooklyn NY 11217 5880000 4 3 4800 Corcoran https://www.zillow.com/homedetails/471-State-S...
2 House for sale 347 State St Brooklyn NY 11217 Brooklyn NY 11217 3795000 3 3 3757 Corcoran - MH Soho https://www.zillow.com/homedetails/347-State-S...
3 Apartment for sale 205 Berkeley Pl Brooklyn NY 11217 Brooklyn NY 11217 3850000 5 4 4500 NaN https://www.zillow.com/homedetails/205-Berkele...
4 House for sale 217 Berkeley Pl Brooklyn NY 11217 Brooklyn NY 11217 5250000 6 5 5415 Compass https://www.zillow.com/homedetails/217-Berkele...
  • For a B&B we are looking for large properties with a good number of bedrooms.
  • Let's look at the distribution of the key features: price, beds, bath and area.
In [465]:
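import matplotlib.pyplot as plt

# histograms of the numeric features; .hist() skips the text column type_of_property
feat_data_11205 = brook_11205_zillow[['type_of_property','price','beds','bath','area_sqft']]
feat_data_11205.hist()
plt.show()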
In [466]:
feat_data_11217 = brook_11217_zillow[['type_of_property','price','beds','bath','area_sqft']]
feat_data_11217.hist(color = "green")
plt.show()
Analysis of 11205 first
In [484]:
#plotting results

plt.scatter(train.price,train.area_sqft, color='blue')
plt.plot(train_x, regr.coef_[0][0]*train_x + regr.intercept_[0], '-r')
plt.xlabel("price")
plt.ylabel('square_ft')
plt.show()
In [486]:
feat_data_11205.corr()['price'].sort_values()
Out[486]:
beds         0.599697
bath         0.632013
area_sqft    0.934563
price        1.000000
Name: price, dtype: float64
  • The above shows which features correlate most strongly with price.
  • As expected, price increases with the area of the property (r ≈ 0.93 for area_sqft).
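The regression cells in this notebook depend on `train`, `train_x` and `regr` from cells that are not shown. A self-contained sketch of the same idea follows, assuming the model predicts price from area; `fit_price_vs_area` is a hypothetical helper, the data are synthetic, and every 5th row is held out as a test set instead of a random mask so the result is deterministic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def fit_price_vs_area(area_sqft, price, test_every=5):
    """Fit price = coef * area_sqft + intercept; return the model and the test-set R^2."""
    X = np.asarray(area_sqft, dtype=float).reshape(-1, 1)  # sklearn expects a 2-D feature matrix
    y = np.asarray(price, dtype=float)
    is_test = np.arange(len(y)) % test_every == 0           # deterministic hold-out split
    model = LinearRegression().fit(X[~is_test], y[~is_test])
    r2 = r2_score(y[is_test], model.predict(X[is_test]))
    return model, r2


# Synthetic, perfectly linear data (not Zillow data): $900 per sq ft plus $50,000
area = np.arange(1000, 6000, 100)
price = 900 * area + 50_000
model, r2 = fit_price_vs_area(area, price)
print(f"coef={model.coef_[0]:.1f}  intercept={model.intercept_:.1f}  test R^2={r2:.3f}")
```

If the notebook's model is indeed price as a function of area, the matching plot puts area on the x-axis and price on the y-axis, so that the scatter points and the red fitted line share the same axes.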
Analysis of 11217
In [496]:
#plotting results

plt.scatter(train.price,train.area_sqft, color='green')
plt.plot(train_x, regr.coef_[0][0]*train_x + regr.intercept_[0], '-r')
plt.xlabel("price")
plt.ylabel('square_ft')
plt.show()
In [498]:
feat_data_11217.corr()['price'].sort_values()
Out[498]:
bath         0.678647
beds         0.684959
area_sqft    0.870823
price        1.000000
Name: price, dtype: float64
  • As expected, area has the strongest correlation with price (r ≈ 0.87).
  • Beds and bath correlate slightly more strongly with price in this ZIP code than in 11205 (≈0.68 vs. ≈0.60-0.63).
  • Below we compute the mean values of the key features per property type.
  • This will inform the recommendation.
In [499]:
feat_data_11217.groupby(['type_of_property']).mean()
Out[499]:
                           price      beds      bath    area_sqft
type_of_property
Apartment for sale  3.301000e+06  6.285714  4.333333  3493.952381
Co-op for sale      2.021714e+06  2.142857  2.142857  1968.000000
Condo for sale      1.591626e+06  2.000000  1.971429  1243.714286
For sale by owner   6.000000e+06  9.000000  4.000000  3432.000000
House for sale      3.773167e+06  4.333333  3.833333  3352.333333
New construction    2.599000e+06  4.000000  3.000000  1806.000000
Townhouse for sale  3.126562e+06  3.250000  2.625000  3244.500000
In [500]:
feat_data_11217.groupby(['type_of_property']).mean()['price'].plot.bar(figsize=(12,7), color ='green')
In [501]:
feat_data_11205.groupby(['type_of_property']).mean()
Out[501]:
                            price      beds      bath    area_sqft
type_of_property
Apartment for sale   2.656929e+06  5.214286  4.285714  3431.071429
Co-op for sale       6.505833e+05  1.166667  1.166667   786.833333
Condo for sale       8.598621e+05  1.275862  1.448276   873.551724
For sale by owner    5.250000e+05  1.000000  1.000000   620.000000
House for sale       2.261998e+06  3.285714  2.571429  2844.571429
Lot / Land for sale  7.990000e+05  0.000000  0.000000  1776.000000
Townhouse for sale   5.500000e+06  6.000000  5.666667  5576.666667
In [503]:
feat_data_11205.groupby(['type_of_property']).mean()['price'].plot.bar(figsize=(12,7))
  • Looking at the 11217 data, "House for sale" and "Townhouse for sale" are the best options.
  • With regard to size and price, apartments are not suitable for this business.
  • With that in mind, we filter the properties for sale down to the ones we need and locate them on the map.
  • We remove properties smaller than 2,500 sq ft and those priced above $4 million.
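The filter described above can be sketched as a boolean mask. The thresholds (2,500 sq ft, $4 million) come from the text; restricting to "House for sale" and "Townhouse for sale" is inferred from the bullets above and the filtered output, and `shortlist` is a hypothetical name. The sample rows are taken from the 11217 listings shown earlier.

```python
import pandas as pd


def shortlist(listings, min_sqft=2500, max_price=4_000_000,
              types=('House for sale', 'Townhouse for sale')):
    """Keep houses/townhouses of at least min_sqft priced at or below max_price."""
    keep = (
        listings['type_of_property'].isin(types)
        & (listings['area_sqft'] >= min_sqft)
        & (listings['price'] <= max_price)
    )
    return listings[keep].reset_index(drop=True)


# Rows taken from the 11217 listings in this notebook
listings = pd.DataFrame({
    'type_of_property': ['Townhouse for sale', 'Townhouse for sale', 'House for sale',
                         'Apartment for sale', 'House for sale'],
    'address': ['134 Douglass St', '471 State St', '347 State St',
                '205 Berkeley Pl', '217 Berkeley Pl'],
    'price': [1688000, 5880000, 3795000, 3850000, 5250000],
    'area_sqft': [1800, 4800, 3757, 4500, 5415],
})
print(shortlist(listings))
```

Only 347 State St survives: 134 Douglass St is too small, 471 State St and 217 Berkeley Pl are too expensive, and 205 Berkeley Pl is an apartment.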
In [507]:
brook_11217_zillow_filtered
Out[507]:
type_of_property address city state postal_code price beds bath area_sqft real estate provider webpage
0 House for sale 347 State St Brooklyn NY 11217 Brooklyn NY 11217 3795000 3 3 3757 Corcoran - MH Soho https://www.zillow.com/homedetails/347-State-S...
1 House for sale 279 Wyckoff St Brooklyn NY 11217 Brooklyn NY 11217 3999000 4 4 3200 NaN https://www.zillow.com/homedetails/279-Wyckoff...
2 House for sale 446 State St Brooklyn NY 11217 Brooklyn NY 11217 2950000 4 4 2816 NaN https://www.zillow.com/homedetails/446-State-S...
3 Townhouse for sale 30 Saint Johns Pl Brooklyn NY 11217 Brooklyn NY 11217 2999500 6 3 3556 Engel & Völkers New York Real Estate https://www.zillow.com/homedetails/30-Saint-Jo...
4 Townhouse for sale 119 Lincoln Pl Brooklyn NY 11217 Brooklyn NY 11217 3350000 4 3 3200 Compass https://www.zillow.com/homedetails/119-Lincoln...
5 Townhouse for sale 187 6th Ave Brooklyn NY 11217 Brooklyn NY 11217 3500000 5 4 3600 John A Maguire Real Estate LLC https://www.zillow.com/homedetails/187-6th-Ave...
6 Townhouse for sale 160 S Portland Ave Brooklyn NY 11217 Brooklyn NY 11217 2950000 4 4 3900 NaN https://www.zillow.com/homedetails/160-S-Portl...
7 House for sale 76 Nevins St Brooklyn NY 11217 Brooklyn NY 11217 3150000 4 3 2926 NaN https://www.zillow.com/homedetails/76-Nevins-S...
8 Townhouse for sale 602 Pacific St Brooklyn NY 11217 Brooklyn NY 11217 2650000 0 1 2700 Avenue Sotheby's International Realty https://www.zillow.com/homedetails/602-Pacific...

>> Map showing the Barclays Center, the surrounding B&Bs and the properties we selected as candidates for a B&B

  • Barclays Center: black
  • B&Bs: green
  • Properties for sale: blue cloud markers; each pop-up includes a link to the property's Zillow page

Barclay_BnB_property_on_sale_map4.png

Results and Discussion

Brooklyn, one of the five boroughs of New York City, has risen to prominence thanks to investment in the real estate sector and quick transport to Manhattan, and has become a trendy place for visitors from all over the world. We set out to find a suitable list of properties within 1.6 km of the Barclays Center, close to venues of interest and in areas not crowded with B&Bs/hostels.

The initial analysis yielded a list of 11 B&Bs and hostels nearby, mostly concentrated in the Clinton Hill and Bed-Stuy neighbourhoods. This led us to look at neighbourhoods adjacent to the Barclays Center in the opposite direction, closer to Manhattan and to sites of interest such as museums and parks.

Once these neighbourhoods were identified, K-Means clustering was applied to find similarities and differences between them. This reinforced the idea of looking for properties in Cluster 0 (ZIP code 11217), whose neighbourhoods offer similar venues such as bars, cafés and international restaurants that add to the visitor experience.

Web scraping Zillow yielded 100+ properties for sale in each of the 2 ZIP codes under analysis: 11205 (Clinton Hill) and 11217 (Gowanus, Boerum Hill, Park Slope). The data needed some cleaning, including removing outliers with a wrong square-foot area or missing information. Once outliers and missing data were cleared out, linear regression on the remaining data supported its consistency: price increases with square-foot area. This gives good confidence for business decision making.

Ultra-expensive and small properties were excluded from the recommendation since they are not suitable for the proposed business. The remaining properties share similar features such as area, beds and price.

The final result is a recommendation of 9 properties that would be a good fit for a B&B business close to the Barclays Center; each property's pop-up on the map links to its real-estate page.

Conclusion

  • Brooklyn is a vibrant place with many venues and a multicultural heritage, and there appears to be room for Bed & Breakfast businesses offering an accessible hospitality option for younger and budget-minded travelers, who spend most of the day visiting the city rather than enjoying the hotel/B&B premises.

  • Using different Data Analysis techniques, it was possible to create a reasonable business proposal and a recommendation to the problem at hand.

  • K-Means and regression are tools that help tell a story and check the data, giving a clearer picture of the environment under study.

  • Tools like this can help people make the best decision for their business.

The techniques, tools and methods learned in the IBM-Coursera course can be applied to a realistic scenario to gather data and build a story around the problem proposed. This exercise could be extended and improved, which is part of the journey of becoming a Data Scientist. Very good course: I started from zero and was able to understand many concepts that are useful in my day-to-day work.